An Approach for Supporting OpenMP on the Intel SCC
Abstract
The advent of the Single-chip Cloud Computer (SCC) in the many-core realm poses challenges for programmers. From a programmer’s perspective, it is desirable to use the shared memory paradigm and to employ high-level parallel programming abstractions such as OpenMP. In this paper we discuss our ongoing efforts to support OpenMP on the SCC. Specifically, we focus on three key aspects of our approach: i) investigating an implementation that is aware of the memory hierarchy; ii) handling OpenMP shared variables; and iii) efficiently implementing synchronization (i.e., barrier) constructs by leveraging SCC hardware support. To address the third aspect, we propose effective barrier synchronization implementations for OpenMP on the SCC. In particular, we present an evaluation of the overhead associated with integrating the barrier algorithms that OpenMP run-time libraries require on such a machine. Our initial experimental results show significant performance improvements of up to 98% on 48 cores.
Similar resources
Pragmatic Performance Portability with OpenMP 4.x. In OpenMP: Memory, Devices, and Tasks: 12th International Workshop on OpenMP, IWOMP
In this paper we investigate the current compiler technologies supporting OpenMP 4.x features targeting a range of devices, in particular, the Cray compiler 8.5.0 targeting an Intel Xeon Broadwell and NVIDIA K20x, IBM’s OpenMP 4.5 Clang branch (clang-ykt) targeting an NVIDIA K20x, the Intel compiler 16 targeting an Intel Xeon Phi Knights Landing, and GCC 6.1 targeting an AMD APU. We outline the...
BDDT-SCC: A Task-parallel Runtime for Non Cache-Coherent Multicores
This paper presents BDDT-SCC, a task-parallel runtime system for non cache-coherent multicore processors, implemented for the Intel Single-Chip Cloud Computer. The BDDT-SCC runtime includes a dynamic dependence analysis and automatic synchronization, and executes OpenMP-Ss tasks on a non cache-coherent architecture. We design a runtime that uses fast on-chip intercore communication with small m...
BareMichael: A Minimalistic Bare-metal Framework for the Intel SCC
The many-core Intel SCC processor is one of a class of emerging, highly parallel computer architectures. Intel provides a modern Linux kernel which, running on the SCC as a separate instance per core, is able to load and launch user applications. However, there is a lack of open-source tools to facilitate development of “bare-metal” SCC applications – applications that are run directly on the c...
Evaluation of Directive-based Performance Portable Programming Models
We present an extended exploration of the performance portability of directives provided by OpenMP 4 and OpenACC to program various types of node architectures with attached accelerators, both self-hosted multicore and offload multicore/GPU. Our goal is to examine how successful OpenACC and the newer offload features of OpenMP 4.5 are for moving codes between architectures, and we document how ...
Cluster-level tuning of a shallow water equation solver on the Intel MIC architecture
The paper demonstrates the optimization of the execution environment of a hybrid OpenMP+MPI computational fluid dynamics code (shallow water equation solver) on a cluster enabled with Intel Xeon Phi coprocessors. The discussion includes: 1. Controlling the number and affinity of OpenMP threads to optimize access to memory bandwidth; 2. Tuning the inter-operation of OpenMP and MPI to partition t...
Publication date: 2013